class: center, middle, inverse, title-slide

# rOpenSci, peer review, statistical software, and testing

## SatRdays
Neuchatel
anywhere

### Mark Padgham
rOpenSci
Münster, Germany

### Saturday 14th March, 2020

---
class: left, middle, inverse

.left-column[
mpadge
ropensci ] .right-column[
bikesRdata
.small[mark@ropensci.org]<br><br>
mpadge.github.io
]

.box-bottom[
slides at <br>
[https://github.com/mpadge/satRday-neuchatel-2020](https://github.com/mpadge/satRday-neuchatel-2020)
]

---

[](https://ropensci.org)

---

## rOpenSci packages

```r
url <- "https://ropensci.github.io/roregistry/registry.json"
x <- jsonlite::fromJSON(url)$packages
names (x)
```

```
##  [1] "name"              "description"       "details"
##  [4] "maintainer"        "keywords"          "github"
##  [7] "status"            "onboarding"        "on_cran"
## [10] "on_bioc"           "url"               "ropensci_category"
```

```r
nrow (x)
```

```
## [1] 386
```

---

## rOpenSci package categories

```r
table (x$ropensci_category)
```

```
##           category   n
##         altmetrics   2
##        data-access 125
##      data-analysis   4
##    data-extraction   6
##   data-publication   5
##         data-tools  25
## data-visualization   5
##          databases   9
##         geospatial  22
##         http-tools  17
##   image-processing   5
##         literature  30
##        scalereprod  32
##           security   8
##           taxonomy  10
```

---

## rOpenSci package authors

```r
head (sort (table (x$maintainer), decreasing = TRUE), 20)
```

```
##
##      Scott Chamberlain            Jeroen Ooms         Carl Boettiger
##                    123                     20                     14
##          Maëlle Salmon         Lincoln Mullen            Karthik Ram
##                     11                      8                      7
##            Dom Bennett         Ildiko Czeller Richèl J.C. Bilderbeek
##                      5                      5                      5
##         Adam H. Sparks           Ju Yeong Kim          Rich FitzJohn
##                      4                      4                      4
##             Andy South           Mark Padgham      Matthew Leonawicz
##                      3                      3                      3
##            Ben Raymond              Bob Rudis         Claudia Vitolo
##                      2                      2                      2
##           Daniel Münch            Edmund Hart
##                      2                      2
```

---
class: center

<!-- -->

.large[
[github.com/ropensci](https://github.com/ropensci) <br>
[ropensci.org](https://ropensci.org)
]

---
class: center

<!-- -->

.large[
What does community engagement<br>with rOpenSci packages look like?
]

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

# Analyses of git repositories

- *Non-Primary Contributions* = code contributions (numbers of commits) by anyone other than the designated primary maintainer <br><br>
- 241 rOpenSci repositories <br>
- 54 RStudio repositories (just for comparison)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

### Non-Primary Contributions to repositories

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

### Package prominence

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

### Prominence and community

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

### Non-Primary Contributions to repositories (revisited)

---
class: center

<!-- -->

.large[
What does community engagement<br>with rOpenSci packages look like?
]

---
class: center

<!-- -->

.large[
What can you do to engage with rOpenSci?
]

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What can you do to engage with rOpenSci?

### As a package user

- Give feedback on [discuss.ropensci.org](https://discuss.ropensci.org) or via GitHub issues
- Submit a Use Case to [discuss.ropensci.org](https://discuss.ropensci.org)
- Participate in regular [Community Calls](https://ropensci.org/commcalls)
- Ping @ropensci on Twitter, along with package authors

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What can you do to engage with rOpenSci?

### As a developer

- Read the [rOpenSci Developer Guide](https://devguide.ropensci.org/contributingguide.html)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What can you do to engage with rOpenSci?
> 15.1 Why contribute to rOpenSci packages?
>
> In general, as explained by Kara Woo in her talk at the CascadiaR conference, contributing to R packages allows you to make things work the way you want (by adding some functionality to your favorite package), can lead to opportunities and allows you to learn about package development.
>
> ... we strive to make contributing a good experience... we are creating social infrastructure through a welcoming and diverse community

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What can you do to engage with rOpenSci?

### As a code contributor

- Read the [rOpenSci Developer Guide](https://devguide.ropensci.org)
- Boost the statistics for non-primary<br>contributions to our software
- Use packages, contribute to or open<br>issues, make pull requests

### As a developer

- Develop your own package
- Open a pre-submission enquiry (= issue) on [github.com/ropensci/software-review](https://github.com/ropensci/software-review)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci Software Categories

```
##              category   n
## 1          altmetrics   2
## 2         data-access 125
## 3       data-analysis   4
## 4     data-extraction   6
## 5    data-publication   5
## 6          data-tools  25
## 7  data-visualization   5
## 8           databases   9
## 9          geospatial  22
## 10         http-tools  17
## 11   image-processing   5
## 12         literature  30
## 13        scalereprod  32
## 14           security   8
## 15           taxonomy  10
```

--

... but [R is a language for statistics](https://github.com/ropensci/software-review/issues/331), right?

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%
class: center, middle

<!-- -->

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

### Why Not? Because it's really difficult

### Why? Because it's really important

--

- The [R Validation Hub](https://www.pharmar.org/about/) is doing it, but exclusively<br>for the bio-pharmaceutical industry.
- We will be (co-)developing a generalised methodology

--

- The [R Validation Hub](https://www.pharmar.org/about/) has 44 organisations yet relatively little money
- rOpenSci has a board of 6 members, around 1.5 full-time staff, and some money

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

> We’ll be working with the new board and the broader statistical software community to develop a set of agreed-upon standards for statistical package implementation and testing, then launching a new peer-review process and testing tools. (rOpenSci blog 15 July 2019)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

### ~~Why Not? Because~~ it's really difficult

--

.left-column[
- Bayesian & Monte Carlo
- Dimensionality & Feature Reduction
- Machine Learning
- Regression, Splines, & Interpolation
- Statistical Indices and Scores
- Visualisation
- Probability Distributions
]
.right-column[
- Wrapper Packages
- Categorical Variables
- Networks
- Exploratory Data Analysis (EDA)
- Survival Analysis
- Workflow Software
- Summary Statistics
- Spatial Analysis
- Educational Software
]

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

> We’ll be working with the new board and the broader statistical software community to develop a set of agreed-upon standards for statistical package implementation and testing, then launching a new peer-review process and testing tools.
(rOpenSci blog 15 July 2019)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

"agreed-upon standards for statistical package implementation and testing"

.left-column[
- Bayesian & Monte Carlo
- Dimensionality & Feature Reduction
- Machine Learning
- Regression, Splines, & Interpolation
- Statistical Indices and Scores
- Visualisation
- Probability Distributions
]
.right-column[
- Wrapper Packages
- Categorical Variables
- Networks
- Exploratory Data Analysis (EDA)
- Survival Analysis
- Workflow Software
- Summary Statistics
- Spatial Analysis
- Educational Software
]

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## rOpenSci peer-review of statistical software

"agreed-upon standards for ~~statistical package implementation and~~ testing"

.left-column[
- Bayesian & Monte Carlo
- Dimensionality & Feature Reduction
- Machine Learning
- Regression, Splines, & Interpolation
- Statistical Indices and Scores
- Visualisation
- Probability Distributions
]
.right-column[
- Wrapper Packages
- Categorical Variables
- Networks
- Exploratory Data Analysis (EDA)
- Survival Analysis
- Workflow Software
- Summary Statistics
- Spatial Analysis
- Educational Software
]

---
background-image: url(images/ropensci-bg-dark.png)
background-size: contain
background-position: 0% 50%
class: inverse

# TESTING

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What is testing?

- Unit Testing
- Functional Testing
- Integration Testing
- Regression Testing
- Lots of other types

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What is testing?

- Unit Testing

What is a "Unit" in R code? Or in an R package?

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What is testing?
- Alternative approaches

### Concrete Testing

Testing of concrete inputs and outputs

### Property-based Testing

Testing the responses of functions based on the general *properties* of their inputs and outputs

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What is testing?

- Alternative approaches

### Concrete Testing

- built into `rust` (the benchmark)
- python via `pytest` and lots of other packages
- R via `testthat` and a few other packages

### Property-based Testing

- `python` via [`hypothesis`](https://hypothesis.works/) (the benchmark)
- `rust` via [`quickcheck`](https://github.com/BurntSushi/quickcheck), [`proptest`](https://lib.rs/crates/proptest), and others

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

### Typical `roxygen` function documentation lines

```r
#' @param x The first input
#' @param y The second input
#' @param z The third and last input
#' @return The output value
#' @export
f <- function(x, y, z) {
    # function definition
}
```

--

`roxygen2` is extensible, and the [`roxytest` package](https://github.com/mikldk/roxytest) enables concrete tests to be specified directly in function documentation.

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

### Typical `roxygen` function documentation<br>plus property-based testing lines

```r
#' @param x The first input
#' @param y The second input
#' @param z The third and last input
#' @return The output value
#' @export
#'
#' @given x integer
#' @given y numeric
#' @given z character
#' @expect is.integer(f(x, y, z))
#' @expect length(f(x, y, z)) == 1
f <- function(x, y, z) {
    # function definition
}
```

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?
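The proposed tags are not implemented anywhere yet. As a rough sketch only, assuming a hypothetical helper `check_property()` (not part of `roxygen2`, `roxytest`, or any other package) and a toy `f()`, the `@given` lines could become random generators and the `@expect` lines a predicate checked on every trial:

```r
# Hypothetical helper: not part of roxygen2, roxytest, or any existing package
check_property <- function(f, generators, expectation, ntrials = 100) {
    for (i in seq_len(ntrials)) {
        args <- lapply(generators, function(g) g()) # one random draw per @given
        res <- do.call(f, args)
        if (!isTRUE(expectation(res))) return(args)  # inputs that broke it
    }
    TRUE # property held on every trial
}

imax <- .Machine$integer.max
f <- function(x, y, z) x + nchar(z) # toy function; y is deliberately unused

generators <- list(
    x = function() as.integer(runif(1, -imax / 2, imax / 2)), # @given x integer
    y = function() runif(1, -1, 1) * .Machine$double.xmax,    # @given y numeric
    z = function() paste(sample(letters, 5), collapse = "")    # @given z character
)
# @expect is.integer(f(x, y, z)) and @expect length(f(x, y, z)) == 1
check_property(f, generators,
               function(res) is.integer(res) && length(res) == 1)
```

It returns `TRUE` when the property holds on every trial, and otherwise the first set of inputs that broke it.

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?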
```r
f <- function(x, y, z) {
    # function definition
}
ntrials <- 100
imax <- .Machine$integer.max # 2,147,483,647
nmax <- .Machine$double.xmax # something around 10 ^ 308
for (i in seq(ntrials)) {
    res <- tryCatch (
        f(x = as.integer(runif(1, min = -imax, max = imax)),
          y = runif(1, min = -1, max = 1) * nmax, # avoids overflow of max - min
          z = "<string>"),
        error = function(e) e
    )
}
```

Property-based testing tests responses<br>to ranges of input values

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

```r
f <- function(x, y, z) {
    # function definition
}
ntrials <- 100
imax <- .Machine$integer.max # 2,147,483,647
nmax <- .Machine$double.xmax # something around 10 ^ 308
for (i in seq(ntrials)) {
    nx <- sample.int(1e6, 1) # random input lengths
    ny <- sample.int(1e6, 1)
    res <- tryCatch (
        f(x = as.integer(runif(nx, min = -imax, max = imax)),
          y = runif(ny, min = -1, max = 1) * nmax,
          z = "<string>"),
        error = function(e) e
    )
}
```

Property-based testing tests responses<br>to structures of input values

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

```r
#' @given x integer
#' @given y numeric
#' @given z character
#' @given length(x) <= 10
#' @given res = f(x, y, z)
#' @expect res is silent
f <- function(x, y, z) {
    # function definition
}
```

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

```r
#' @given x integer
#' @given y numeric
#' @given z character
#' @given length(x) <= 10
#' @given res = f(x, y, z)
#' @expect res is silent
#'
#' @given length(x) == 10
#' @given res = f(x, y, z)
#' @expect res is silent
#' @report res is error
f <- function(x, y, z) {
    # function definition
}
```

- Bug reports use the same grammar as tests

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?
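A `@report` line could in principle be generated automatically by searching for the smallest input that breaks an `@expect res is silent` property, a crude form of what `hypothesis` and `quickcheck` call shrinking. A sketch, assuming a hypothetical toy `f()` that fails once `length(x)` reaches 10, and a hypothetical helper `is_silent()`:

```r
# Toy f(): hypothetical, errors once x has 10 or more elements
f <- function(x, y, z) {
    if (length(x) >= 10) stop("x is too long")
    sum(x) + y # z is deliberately unused
}

# "res is silent": no error, warning, or message for a random x of length n
is_silent <- function(n) {
    x <- sample(-100:100, n, replace = TRUE)
    res <- tryCatch(f(x, y = 1, z = "a"),
                    error = function(e) e,
                    warning = function(w) w,
                    message = function(m) m)
    !inherits(res, "condition")
}

# Smallest length of x that breaks the property: what "@report" would record
ns <- 1:20
first_fail <- ns[!vapply(ns, is_silent, logical(1))][1]
first_fail # 10
```

Real shrinkers also simplify the values themselves, not only the lengths.

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?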
```r
#' @given x integer
#' @given y numeric
#' @given z character
#' @given length(x) <= 10
#' @given res = f(x, y, z)
#' @expect res is silent
#'
#' @given viz = TRUE
#' @report f(x, y, z, viz) is error
#' @request f(x, y, z, viz) produces interactive graphical output
f <- function(x, y, z, viz) {
    # function definition
}
```

- Feature requests use the same grammar as tests

--

- Feature requests are pull requests, which are direct code contributions

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

### Non-Primary Contributions to repositories (revisited)

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

```r
#' @given x integer
#' @given y numeric
#' @given z character
#' @given length(x) <= 10
#' @given res = f(x, y, z)
#' @expect res is silent
#'
#' @given viz = TRUE
#' @report f(x, y, z, viz) is error
#' @request f(x, y, z, viz) produces interactive graphical output
f <- function(x, y, z, viz) {
    # function definition
}
```

- Feature requests use the same grammar as tests
- Feature requests are pull requests, which are direct code contributions

---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## What might property-based testing look like?

- [cucumber.io](https://cucumber.io)
- Lots of prior work
- Efforts underway to incorporate/adapt within python
- R is an opportunity in waiting

--

## Property-based testing can build community

---
background-image: url(images/ropensci-bg-dark.png)
background-size: contain
background-position: 0% 50%
class: left, middle, inverse

## Please Help! Please Contribute!

.left-column[
mpadge
ropensci ] .right-column[
bikesRdata
.small[mark@ropensci.org]<br><br>
mpadge.github.io
]

.box-bottom[
slides at <br>
[https://github.com/mpadge/satRday-neuchatel-2020](https://github.com/mpadge/satRday-neuchatel-2020)
]
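
---
background-image: url(images/ropensci-bg.svg)
background-size: contain
background-position: 0% 50%

## Concrete versus property-based testing

A minimal base-R sketch of the two approaches contrasted earlier, using `rev()` purely as a stand-in function: the concrete test checks one fixed input against one known output, while the property-based test checks general properties of the output over many random inputs.

```r
# Concrete test: one fixed input, one known output
stopifnot(identical(rev(c(1, 2, 3)), c(3, 2, 1)))

# Property-based test: general properties over many random inputs
set.seed(1)
for (i in seq_len(100)) {
    x <- runif(sample.int(50, 1))        # random length (1 to 50) and values
    stopifnot(identical(rev(rev(x)), x),  # reversing twice is the identity
              length(rev(x)) == length(x),
              rev(x)[1] == x[length(x)])  # first of rev(x) is last of x
}
```

A failure in the second test would report which random `x` broke which property, rather than just one hand-picked case.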